Data Analysis Project: Semi-Supervised Discovery of Named Entities and Relations from the Web
Abstract
This project studies the semi-supervised discovery of named entities, relational entities, and prepositional phrase attachments within a read-the-web framework. New meanings of an entity can emerge and be updated faster on the web than in printed references. The main goal of this project is to study the feasibility of characterizing entities directly by their web content. In our approach, the contextual words surrounding an entity on web pages are first extracted and converted into a Bag-of-Words (BOW) representation. We then apply several supervised and semi-supervised learning methods to these contextual words for three well-known research problems: Named Entity Recognition, Relation Extraction, and Prepositional Phrase Attachment.
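The two-stage pipeline in the abstract (collect the context words around an entity, turn them into a BOW vector, then learn from a few labeled and many unlabeled entities) can be illustrated with a minimal sketch. The abstract does not specify the tokenizer, context size, or learners, so the following are assumptions: a lowercase alphanumeric tokenizer, a fixed window of three tokens on each side of a mention, and self-training with a nearest-centroid (cosine similarity) classifier as one example of a semi-supervised method, not necessarily one the project used. The function names `context_bow`, `cosine`, and `self_train`, the toy pages, and the `LOC`/`ORG` labels are hypothetical.

```python
import math
import re
from collections import Counter


def context_bow(pages, entity, window=3):
    """Bag of words from the `window` tokens on each side of every mention of `entity`."""
    target = re.findall(r"[a-z0-9]+", entity.lower())
    bow = Counter()
    if not target:
        return bow
    n = len(target)
    for text in pages:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                bow.update(tokens[max(0, i - window):i])  # left context
                bow.update(tokens[i + n:i + n + window])  # right context
    return bow


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def self_train(labeled, unlabeled, threshold=0.5, max_rounds=5):
    """Self-training with a nearest-centroid classifier over BOW vectors.

    labeled:   {entity: (bow, label)} seed examples
    unlabeled: {entity: bow} candidates to label
    Returns {entity: label} for the seeds plus every confidently labeled candidate.
    """
    labeled, unlabeled = dict(labeled), dict(unlabeled)
    for _ in range(max_rounds):
        # One centroid per label: summed BOW counts of its current members.
        centroids = {}
        for bow, label in labeled.values():
            centroids.setdefault(label, Counter()).update(bow)
        if not centroids:
            break
        newly = {}
        for ent, bow in unlabeled.items():
            scores = {lab: cosine(bow, c) for lab, c in centroids.items()}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:  # accept only confident predictions
                newly[ent] = (bow, best)
        if not newly:
            break
        labeled.update(newly)
        for ent in newly:
            del unlabeled[ent]
    return {ent: label for ent, (_, label) in labeled.items()}


# Hypothetical toy "web pages" and seed labels.
pages = [
    "Paris is the capital of France.",
    "Toyota is a company that makes cars.",
    "Madrid is the capital of Spain.",
    "Sony is a company that makes electronics.",
]
seeds = {"Paris": (context_bow(pages, "Paris"), "LOC"),
         "Toyota": (context_bow(pages, "Toyota"), "ORG")}
candidates = {e: context_bow(pages, e) for e in ["Madrid", "Sony"]}
print(self_train(seeds, candidates))
# {'Paris': 'LOC', 'Toyota': 'ORG', 'Madrid': 'LOC', 'Sony': 'ORG'}
```

Self-training adds to the labeled set only those predictions whose best score reaches `threshold`; raising it admits fewer new examples per round but reduces the risk of propagating early mistakes.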
Publication date: 2009